Stabilized Nearest Neighbor Classifier and Its Statistical Properties
Abstract
Stability has long been a major concern in statistics: similar statistical conclusions should be drawn from different data sets sampled from the same population. In this article, we introduce a general measure of classification instability (CIS) to capture the sampling variability of the predictions made by a classification procedure. The minimax rate of CIS is established for general plug-in classifiers. As a concrete example, we consider the stability of the nearest neighbor classifier. In particular, we derive an asymptotically equivalent form for the CIS of a weighted nearest neighbor classifier. This allows us to develop a novel stabilized nearest neighbor classifier that balances the trade-off between classification accuracy and stability. The resulting classification procedure is shown to attain the minimax optimal rates in both excess risk and CIS. Extensive experiments demonstrate a significant improvement in CIS over existing nearest neighbor classifiers at a negligible cost in classification accuracy.
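The abstract does not spell out how CIS is computed, so the following is only an illustrative sketch. It assumes that instability can be approximated by the rate at which two classifiers, each trained on an independent sample from the same population, disagree on fresh test points. The data model (a two-class Gaussian mixture), the helper names (sample, knn_predict, estimate_instability), and all parameter values are hypothetical, and the classifier is a plain unweighted k-nearest-neighbor rule, not the stabilized classifier proposed in the article.

```python
import numpy as np


def sample(n, rng):
    """Draw n points from a hypothetical two-class Gaussian mixture in 2-D."""
    y = rng.integers(0, 2, size=n)
    # Class 1 is shifted by (1.5, 1.5); class 0 is centered at the origin.
    x = rng.normal(size=(n, 2)) + 1.5 * y[:, None]
    return x, y


def knn_predict(x_train, y_train, x_test, k):
    """Unweighted k-nearest-neighbor majority vote with Euclidean distance."""
    # Squared distances between every test point and every training point.
    d2 = ((x_test[:, None, :] - x_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = y_train[nearest].mean(axis=1)
    return (votes >= 0.5).astype(int)


def estimate_instability(k, n=200, n_test=500, reps=20, seed=0):
    """Average disagreement rate between two k-NN classifiers trained on
    independent samples of size n (an assumed empirical proxy for CIS)."""
    rng = np.random.default_rng(seed)
    x_test, _ = sample(n_test, rng)
    rates = []
    for _ in range(reps):
        x1, y1 = sample(n, rng)
        x2, y2 = sample(n, rng)
        p1 = knn_predict(x1, y1, x_test, k)
        p2 = knn_predict(x2, y2, x_test, k)
        rates.append(np.mean(p1 != p2))
    return float(np.mean(rates))


if __name__ == "__main__":
    for k in (1, 5, 25):
        print(f"k={k}: estimated disagreement rate = {estimate_instability(k):.3f}")
```

Under this toy setup, a larger k would be expected to give a lower disagreement rate, which is the kind of accuracy-versus-stability trade-off the abstract refers to.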
Similar resources
Nearest Neighbor Classifier with Optimal Stability
Outline: Motivations; Classification instability and its minimax properties; Stabilized nearest neighbor classifier; Experiments. Sun, Wei (Purdue). Motivation: Begley and Ellis (Nature, 2012) found that 47/53 medical research papers on the subject of cancer were irreproducible. In the paper "Stability" (Yu, 2013), Bin Yu wrote ...reprodu...
EFFECT OF THE NEXT-NEAREST NEIGHBOR INTERACTION ON THE ORDER-DISORDER PHASE TRANSITION
In this work, one- and two-dimensional lattices are studied theoretically using a statistical mechanical approach. Both the nearest and next-nearest neighbor interactions are taken into account, and the approximate thermodynamic properties of the lattices are calculated. The results of our calculations show that: (1) even though the next-nearest neighbor interaction may have an insignificant ef...
K-D Decision Tree: An Accelerated and Memory Efficient Nearest Neighbor Classifier
This paper presents a novel Nearest Neighbor (NN) classifier. NN classification is a well-studied method for pattern classification with the following properties: * it performs maximum-margin classification and achieves less than twice the ideal Bayesian error, * it does not require knowledge of pattern distributions, kernel functions, or base classifiers, and * it can naturally be appl...
Improving the Behavior of the Nearest Neighbor Classifier against Noisy Data with Feature Weighting Schemes
The Nearest Neighbor rule is one of the most successful classifiers in machine learning, but it is very sensitive to noisy data, which may cause its performance to deteriorate. This contribution proposes a new feature weighting classifier that tries to reduce the influence of noisy features. The computation of the weights is based on combining imputation methods and nonparametric statistical ...
Using Weighted Nearest Neighbor to Benefit from Unlabeled Data
The development of data-mining applications such as text classification and molecular profiling has shown the need for machine learning algorithms that can benefit from both labeled and unlabeled data, where the unlabeled examples often greatly outnumber the labeled examples. In this paper we present a two-stage classifier that improves its predictive accuracy by making use of the available unla...